Effects of Random Sampling on SVM Hyper-parameter Tuning
Authors
Abstract
Hyper-parameter tuning is one of the crucial steps in the successful application of machine learning algorithms to real data. In general, the tuning process is modeled as an optimization problem for which several methods have been proposed. For complex algorithms, evaluating a hyper-parameter configuration is expensive, so the runtime is often reduced by sampling the data. In this paper, the effect of sample size on the results of the hyper-parameter tuning process is investigated. The hyper-parameters of Support Vector Machines are tuned on samples of different sizes generated from a dataset, and the Hausdorff distance is proposed for computing the difference between the tuning results obtained on two samples of different sizes. 100 real-world datasets and two tuning methods (Random Search and Particle Swarm Optimization) are used in the experiments, which reveal some interesting relations between sample size and the results of hyper-parameter tuning and open promising directions for future investigation.
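As a minimal sketch of the workflow described above, the snippet below tunes the (C, gamma) hyper-parameters of an RBF SVM with random search on two subsamples of different sizes and then compares the resulting sets of good configurations with the symmetric Hausdorff distance. The dataset, number of iterations, how many top configurations enter each set, and the log-scaled coordinate space are assumptions for illustration; the abstract does not specify the exact protocol.

```python
# Sketch: random-search SVM tuning on two subsamples of different size,
# then Hausdorff distance between the resulting sets of configurations.
import numpy as np
from scipy.stats import loguniform
from scipy.spatial.distance import directed_hausdorff
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)

def tune_on_sample(sample_size, n_top=10):
    """Random-search (C, gamma) on a random subsample; return the n_top best
    configurations in (log10 C, log10 gamma) coordinates."""
    idx = rng.choice(len(X), size=sample_size, replace=False)
    search = RandomizedSearchCV(
        SVC(kernel="rbf"),
        param_distributions={"C": loguniform(1e-2, 1e3),
                             "gamma": loguniform(1e-4, 1e1)},
        n_iter=50, cv=3, random_state=0,
    )
    search.fit(X[idx], y[idx])
    results = search.cv_results_
    order = np.argsort(results["mean_test_score"])[::-1][:n_top]
    return np.array([[np.log10(results["param_C"][i]),
                      np.log10(results["param_gamma"][i])] for i in order])

small = tune_on_sample(100)
large = tune_on_sample(500)

# Symmetric Hausdorff distance between the two sets of configurations.
h = max(directed_hausdorff(small, large)[0],
        directed_hausdorff(large, small)[0])
print(f"Hausdorff distance between tuning results: {h:.3f}")
```

A small distance indicates that tuning on the smaller sample leads to roughly the same region of the hyper-parameter space as tuning on the larger one.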
Similar resources
Parameter Tuning via Kernel Matrix Approximation for Support Vector Machine
Parameter tuning is essential to the generalization of support vector machines (SVM). Previous methods usually adopt a nested two-layer framework, where the inner layer solves a convex optimization problem and the outer layer selects the hyper-parameters by minimizing either the cross-validation error or other error bounds. In this paper, we propose a novel parameter tuning approach for SVM via kernel matrix...
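The cited paper's specific approximation is not described in this teaser; as a hedged illustration of the general idea, the sketch below uses Nyström features (one common low-rank kernel approximation) to make each hyper-parameter evaluation cheaper during a grid search.

```python
# Illustrative only: approximate the kernel with Nystroem features so that
# each hyper-parameter configuration is evaluated on a cheaper linear SVM.
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

pipe = Pipeline([
    ("kernel_approx", Nystroem(kernel="rbf", n_components=200, random_state=0)),
    ("svm", LinearSVC(dual=False, max_iter=5000)),
])
param_grid = {
    "kernel_approx__gamma": [1e-3, 1e-2, 1e-1],
    "svm__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_, search.best_score_)
```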
Investigating Exploratory Capabilities of Uncertainty Sampling using SVMs in Active Learning
Active learning provides a solution for annotating huge pools of data efficiently so that they can be used for mining and business analytics. To this end, it reduces the set of instances that have to be annotated by an expert to the most informative ones. A common approach is to use uncertainty sampling in combination with a support vector machine (SVM). Some papers argue that uncertainty sampling performs b...
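A minimal sketch of the uncertainty-sampling step mentioned above: for a binary SVM, query the unlabeled instances closest to the decision boundary, i.e. with the smallest absolute decision-function value. The data, pool sizes, and batch size are assumptions for illustration.

```python
# Uncertainty sampling with an SVM (binary case): pick the unlabeled points
# with the smallest margin to the decision boundary for expert annotation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
labeled = np.arange(20)                      # small initial labeled pool
unlabeled = np.arange(20, len(X))            # pool to query from

clf = SVC(kernel="rbf", gamma="scale").fit(X[labeled], y[labeled])
margins = np.abs(clf.decision_function(X[unlabeled]))
query = unlabeled[np.argsort(margins)[:10]]  # 10 most uncertain instances
print("Instances to send to the expert:", query)
```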
Practical selection of SVM parameters and noise estimation for SVM regression
We investigate the practical selection of hyper-parameters for support vector machine (SVM) regression (that is, the epsilon-insensitive zone and the regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than the re-sampling approaches commonly used in SVM applications. In particular, we describe a new analytical prescription for s...
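The truncated abstract does not spell out the prescription; as a hedged sketch, the snippet below implements a commonly cited analytic rule of this kind, with C derived from the spread of the training targets and epsilon from an estimated noise level and the sample size. The exact formulas are an assumption here, not a quotation from the paper.

```python
# Hedged sketch of an analytic (data-driven) choice of SVR hyper-parameters:
# C from the spread of the training targets, epsilon from an estimated noise
# standard deviation and the sample size. Formulas are assumed, not quoted.
import numpy as np

def analytic_svr_params(y_train, noise_std):
    """Return (C, epsilon) from training targets and an estimated noise std."""
    n = len(y_train)
    y_mean, y_std = np.mean(y_train), np.std(y_train)
    C = max(abs(y_mean + 3 * y_std), abs(y_mean - 3 * y_std))
    epsilon = 3 * noise_std * np.sqrt(np.log(n) / n)
    return C, epsilon

y_train = np.random.RandomState(0).normal(loc=5.0, scale=2.0, size=200)
print(analytic_svr_params(y_train, noise_std=0.5))
```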
On the overestimation of random forest’s out-of-bag error
Background: The ensemble method random forests has become a popular classification tool in bioinformatics and related fields. The out-of-bag error is an error estimation technique often used to evaluate the accuracy of a random forest, as well as for selecting appropriate values for tuning parameters such as the number of candidate predictors that are randomly drawn for a split, referre...
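For reference, the out-of-bag error described above is readily available in scikit-learn, as sketched below: each tree is evaluated on the bootstrap samples it did not see during training. This snippet only computes the estimate; it does not by itself demonstrate the overestimation studied in the cited paper.

```python
# Out-of-bag (OOB) error of a random forest with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=0).fit(X, y)
print("OOB error:", 1.0 - forest.oob_score_)  # oob_score_ is OOB accuracy
```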
Image Classification Based on KPCA and SVM with Randomized Hyper-parameter Optimization
Image classification is one of the most fundamental and useful activities in the computer vision domain. For better accuracy and execution efficiency under the circumstance of high-dimensional feature descriptors in image classification, we propose a novel framework for multi-class image classification based on kernel principal component analysis (KPCA) for feature descriptor post-processing and s...
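The following is an illustrative sketch, not the paper's pipeline: KPCA as a post-processing / dimensionality-reduction step for feature vectors, followed by an SVM, with both stages tuned jointly by randomized hyper-parameter search. The dataset, search ranges, and number of iterations are assumptions.

```python
# KPCA + SVM pipeline with randomized hyper-parameter optimization.
from scipy.stats import loguniform, randint
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

pipe = Pipeline([("kpca", KernelPCA(kernel="rbf")), ("svm", SVC(kernel="rbf"))])
param_distributions = {
    "kpca__n_components": randint(10, 60),
    "kpca__gamma": loguniform(1e-4, 1e-1),
    "svm__C": loguniform(1e-1, 1e3),
    "svm__gamma": loguniform(1e-4, 1e-1),
}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=20, cv=3,
                            random_state=0).fit(X, y)
print(search.best_params_, search.best_score_)
```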